Abstract

Motivated by problems arising in random sampling of trigonometric polynomials,
we derive exponential inequalities for the operator norm of the difference
between the sample second moment matrix $n^{-1}U^*U$ and its expectation, where
$U$ is a complex random $n \times D$ matrix with independent rows.
These results immediately imply deviation inequalities for the largest (smallest)
eigenvalues of the sample second moment matrix, which in turn lead to
results on the condition number of the sample second moment matrix. We
also show that trigonometric polynomials in several variables can be learned
from $\mathrm{const}\cdot D\ln D$ random samples.

Keywords: eigenvalues; exponential inequality; learning theory; random 
matrix; random sampling; trigonometric polynomial. 

1 Introduction 

Let $U$ be a complex random $n \times D$ matrix with independent rows. The matrix of
(non-centered) sample second moments is then given by $n^{-1}U^*U$. We provide
exponential probability inequalities for the operator norm of the difference between the
sample second moment matrix and its expectation. These results immediately imply 
deviation inequalities for the largest (smallest) eigenvalues of the sample second mo- 
ment matrix. As a consequence we obtain probability inequalities for the condition 
number of the sample second moment matrix. Sample second moment matrices arise 
as central objects of interest in many areas, such as multivariate analysis, stochastic
linear regression, time series analysis, and learning theory. 

Our motivation comes from learning theory and, in particular, from random 
sampling of trigonometric polynomials. Random sampling is a strategy of choice for 
learning an unknown function in a given class of functions. This idea is predomi- 
nant in the version of learning theory and sampling theory by Cucker, Smale, and 
Zhou [7, 15]. In Bass and Gröchenig [1] the randomization of the samples was used
for the justification of numerical algorithms. Random sampling and random mea- 
surements are central in the emerging field of sparse reconstruction, also referred to 
as compressed sensing j3j HJ El El Ell El 02] • 

In this paper, we revisit the random sampling of trigonometric polynomials with 
a given degree or support, which was first studied in [1]. We first review and
supplement the probability inequalities for the condition number of the associated Fourier
sample second moment matrix in [1] (Section 2). In Section 3 we replace Fourier
matrices by general random matrices with independent rows and derive probability
estimates for sample second moment matrices obtained from general random
matrices $U$. Our main result is an exponential probability inequality for the condition
number of the sample second moment matrix for a vast class of random matrices.
These include random matrices with independent identically distributed (i.i.d.) rows
and bounded entries. The boundedness assumption on the entries can be relaxed
to the existence of finite moment generating functions. Our proof is much simpler
than the one in [1] and allows us to incorporate the case of random matrices with
independent, but not necessarily identically distributed rows. This rather technical
extension is treated in Section 3.2.

A further feature of our results is that all constants are given explicitly as a
function of the parameters that describe the distribution of the random matrices.
The explicit form of the constants is important to determine the sample size for
which the condition number of the sample second moment matrix is small with high
("overwhelming") probability.

Mendelson and Pajor [9] have recently found a related, beautiful and deep
inequality for the sample second moment matrix of random matrices. While in the
same spirit, their assumptions, methods, and conclusions are different, and so their
and our results are not directly comparable in that neither result implies the other.
The inequality of [9] involves an unspecified absolute constant so that the main
contribution is an asymptotic bound for the condition number as the sample size $n$
tends to infinity. A detailed comparison between [9] and our result will be given in
Section 3.1.1.

In learning theory one is often interested in the efficiency of the sampling
procedure, i.e., in Cucker and Smale's words [7], "how many random samples do we
need to assert, with confidence $1-\delta$, that the condition number does not exceed
a given threshold." For random sampling of trigonometric polynomials in several
variables inspection shows that the probability inequalities in [1], as well as the ones
in Section 3 of the present paper, lead to lower bounds for the required sample size
that are typically of the order $D^2\ln D$. We show in Section 3.1.1 how the result in
[9] can be used to improve this order to $D\ln D$. In Section 4 this result is further
improved by using the method developed in [10] (after inspiration from [3]). To put
it more casually, these results show that we need $\mathrm{const}\cdot D\ln D$ random samples to
learn a trigonometric polynomial taken from a $D$-dimensional space. This seems to
be the optimal order that can be expected in a probabilistic setting.

Notation. By $\|\cdot\|_2$ we denote the usual Euclidean norm on $\mathbb{C}^D$. For a
(hermitian) matrix $A$ we denote by $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ the maximal and minimal
eigenvalues of $A$. The condition number of $A$ is then given by $\lambda_{\max}(A)/\lambda_{\min}(A)$.
For a matrix $A$ its transpose is denoted by $A'$ and its conjugate-transpose by $A^*$.
The operator norm of a matrix is $\|A\| = \lambda_{\max}(A^*A)^{1/2}$. By $P$ we denote the
probability measure on the probability space supporting all the random variables used
subsequently, and $E$ denotes the corresponding expectation operator.

2 Random Sampling of Trigonometric Polynomials

Let $\Gamma$ be a (non-empty) finite subset of $\mathbb{Z}^d$. By $V_\Gamma$ we denote the space of all
trigonometric polynomials in dimension $d$ with coefficients supported on $\Gamma$. Such a
polynomial has the form

\[ f(x) = \sum_{k\in\Gamma} a_k e^{2\pi i k\cdot x}, \qquad x\in[0,1]^d, \]

with coefficients $a_k\in\mathbb{C}$. If $\Gamma = \{-m,-m+1,\dots,m-1,m\}^d$, then $V_\Gamma$ is the
space of all trigonometric polynomials of degree at most $m$. We let $D = |\Gamma|$ be the
dimension of $V_\Gamma$.

Let $x_1,\dots,x_n\in[0,1]^d$. We are interested in the reconstruction of a trigonometric
polynomial $f$ from its sample values $f(x_1),\dots,f(x_n)$. Let $y = (f(x_1),\dots,f(x_n))'$
be the vector of sampled values of $f$ and let $U$ be the $n\times D$ matrix with entries

\[ u_{tk} = e^{2\pi i k\cdot x_t}, \qquad k\in\Gamma,\ t=1,\dots,n. \tag{2.1} \]

The reconstruction of / amounts to solving the linear system 

Ua = y 

for the coefficient vector $a = (a_k)_{k\in\Gamma}$. Alternatively, one may try to solve the normal
equation 

U*Ua = U*y. 

We note that the invertibility of $U^*U$ is equivalent to the sampling inequalities

\[ A\|f\|_2^2 \le \sum_{t=1}^n |f(x_t)|^2 = a^*U^*Ua \le B\|f\|_2^2 \quad\text{for all } f\in V_\Gamma, \tag{2.2} \]

for some positive real numbers $A$ and $B$, and that the condition number of $U^*U$ is
bounded by $B/A$.
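
To make the objects above concrete, the following sketch builds the matrix $U$ from (2.1) for randomly drawn points and forms $n^{-1}U^*U$. It is only an illustration under stated assumptions: the helper names `fourier_matrix` and `sample_second_moment`, the fixed seed, and the small parameters $d = 2$, $m = 1$, $n = 200$ are choices made here and do not appear in the original text.

```python
import cmath
import random

def fourier_matrix(points, freqs):
    """n x D matrix with entries u_tk = exp(2*pi*i * k.x_t), as in (2.1)."""
    return [[cmath.exp(2j * cmath.pi * sum(ki * xi for ki, xi in zip(k, x)))
             for k in freqs] for x in points]

def sample_second_moment(U):
    """Return n^{-1} U^* U as a D x D list of lists."""
    n, D = len(U), len(U[0])
    return [[sum(U[t][k].conjugate() * U[t][j] for t in range(n)) / n
             for j in range(D)] for k in range(D)]

rng = random.Random(0)            # fixed seed, for reproducibility only
d, m, n = 2, 1, 200               # hypothetical small example
# Gamma = {-m, ..., m}^d, so D = (2m + 1)^d = 9 here
freqs = [(a, b) for a in range(-m, m + 1) for b in range(-m, m + 1)]
points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
G = sample_second_moment(fourier_matrix(points, freqs))
```

Since $|u_{tk}| = 1$, the diagonal of `G` equals 1 up to rounding, and `G` is Hermitian; the off-diagonal entries are the random quantities whose size the later sections control.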

In the spirit of learning theory, one assumes that the sampling points are taken 
at random. Then the matrix U*U is a random matrix depending on the sampling 
points $(x_t)$. Several questions arise:

1. Determine the probability that U*U is invertible. 

2. Determine the probability that the condition number of U*U does not exceed 
a given threshold. 

3. Determine the number of random samples required to achieve such estimates. 
This is the effectivity problem for random sampling. 

For trigonometric polynomials Question 1 has been answered in [1, Thm. 1.1]: If
the $x_t$ are i.i.d. with a distribution that is absolutely continuous with respect to the
Lebesgue measure on $[0,1]^d$, then $U^*U$ is invertible almost surely provided $n \ge D$.

Furthermore, some answers to Questions 2 and 3 are also provided in [1]. It
is shown that $U^*U$ is well-conditioned whenever the number of samples is large
enough [1, Thms. 5.1, 6.2]:

Theorem 2.1. Assume that $x_1,\dots,x_n$ are i.i.d. random variables uniformly
distributed on $[0,1]^d$. Let $U$ be the associated random Fourier matrix defined in (2.1).
Let $\epsilon\in(0,1)$. There exist positive constants $A, B$ depending only on $D = |\Gamma|$ such
that the event

\[ 1-\epsilon \le \lambda_{\min}(n^{-1}U^*U) \le \lambda_{\max}(n^{-1}U^*U) \le 1+\epsilon \]

has probability at least

In particular, with probability not less than (2.3), the condition number of $U^*U$ is
bounded by $(1+\epsilon)/(1-\epsilon)$.

A careful analysis of the constants $A$ and $B$ in (2.3) reveals that the number of
samples required in (2.3) to guarantee a probability $\ge 1-\delta$ is

\[ n \ge C D^2\ln D, \tag{2.4} \]

where $C$ depends on $\delta$ and $\epsilon$. If $\Gamma = \{-m,\dots,m\}^d$ (trigonometric polynomials of
degree $m$ in $d$ variables), then $D = (2m+1)^d$, and this bound on the number of
samples is unfortunately too large to be useful at more realistic sample sizes like
$n\sim D$ or $n\sim D\ln D$.

For the case $\Gamma = \{-m,\dots,m\}^d$ a better estimate for the condition number can
be extracted from [1]. We work, however, with a slightly different matrix. Given
the sampling points $x_1,\dots,x_n$, we define the Voronoi regions

\[ V_t := \{y\in[0,1]^d : \|y-x_t\|_2 \le \|y-x_s\|_2,\ s\ne t,\ 1\le s\le n\}, \qquad t=1,\dots,n, \]

and let $w_t = |V_t|$ be the Lebesgue measure of $V_t$. We consider the weighted matrix

\[ T_w := U^*WU, \]

where $W$ is the diagonal matrix with the weights $w_t$, $t=1,\dots,n$, on the diagonal.
Note that $a$ is also the solution of $T_w a = U^*Wy$. The following result is implicit
in [1].

Theorem 2.2. Let $\Gamma = \{-m,\dots,m\}^d$, i.e., we consider trigonometric polynomials
of $d$ variables of degree $m$. Suppose that $x_1,\dots,x_n$ are i.i.d. random variables which
are uniformly distributed on $[0,1]^d$. Choose $\gamma\in(0,1)$. If

\[ n \ge \left(\frac{2\pi d}{\gamma\ln 2}\right)^d m^d \ln\left(\left(\frac{2\pi d}{\gamma\ln 2}\right)^d \frac{m^d}{\delta}\right), \tag{2.5} \]

then with probability at least $1-\delta$ the condition number of $T_w$ is bounded by $(1-2^{\gamma-1})^{-2}$.

Proof: By combining a deterministic estimate with a probabilistic covering result,
the following estimate was derived in [1, Thm. 4.2]: Let $N\in\mathbb{N}$ be arbitrary; then
with probability at least $1 - N^d e^{-n/N^d}$ we have

\[ (2 - e^{2\pi m d/N})^2 \le \lambda_{\min}(T_w) \le \lambda_{\max}(T_w) \le 4. \]

For the condition number to be bounded by $4(2-2^{\gamma})^{-2} = (1-2^{\gamma-1})^{-2}$ with probability
at least $1-\delta$, we need that

\[ 2\pi m d/N \le \gamma\ln 2 \quad\text{and}\quad N^d e^{-n/N^d} \le \delta. \]

By solving for $n$, we find that $n$ must satisfy the inequality (2.5). ■
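
The two requirements in the proof can be turned into numbers. The sketch below evaluates the right-hand side of (2.5) and the corresponding condition number bound; the function names are hypothetical and the real-valued $N = 2\pi m d/(\gamma\ln 2)$ is used directly, as in (2.5), without rounding to an integer.

```python
import math

def samples_theorem_2_2(m, d, gamma, delta):
    """Right-hand side of (2.5): N^d * ln(N^d / delta),
    where N = 2*pi*m*d / (gamma * ln 2)."""
    n_d = (2 * math.pi * d / (gamma * math.log(2))) ** d * m ** d
    return n_d * math.log(n_d / delta)

def condition_bound(gamma):
    """Bound (1 - 2^(gamma - 1))^(-2) on the condition number of T_w."""
    return (1 - 2 ** (gamma - 1)) ** (-2)

# Illustrative values only: degree m = 2, gamma = 1/2, delta = 0.1
n_one_variable = samples_theorem_2_2(2, 1, 0.5, 0.1)
n_two_variables = samples_theorem_2_2(2, 2, 0.5, 0.1)
```

Evaluating the function for increasing $d$ at fixed $m$ shows how quickly the factor $(2\pi d/(\gamma\ln 2))^d$ grows with the number of variables.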

Since $D = |\Gamma| = |\{-m,-m+1,\dots,m-1,m\}^d| = (2m+1)^d$, Theorem 2.2
becomes effective for

\[ n \approx (\pi d)^d D\ln\left((\pi d)^d D\right). \tag{2.6} \]

Thus Theorem 2.2 is a genuine improvement over (2.4) for fixed value of $d$. The
dependence $n\sim D\ln D$ on the dimension of the function space seems to be of the
correct order. However, the constant $(\pi d/\gamma)^d$ depends strongly on the number of
variables $d$, and so Theorem 2.2 does not escape the curse of dimensionality.

In Theorem 4.1 we will prove a much better result for the condition number of
$U^*U$ where the constants depend neither on $d$ nor on the special form of the
spectrum $\Gamma$. See also Corollary 3.5.

3 Exponential Inequalities For Sample Second Moment Matrices

In this section we abstract from the concrete form of $U$ as given in (2.1) and consider
arbitrary complex random matrices with independent rows satisfying some regularity
conditions. Apart from being of interest in its own right, this more general setting allows
one to study random sampling not only for trigonometric polynomials but also for
more general types of finite-dimensional function spaces, such as random sampling
of algebraic polynomials on domains, or of spaces of spherical harmonics on the
sphere (see [1, Sect. 6] for a list of examples).

3.1 The I.I.D. Case 

We assume first that the random matrix $U\in\mathbb{C}^{n\times D}$ has independent identically
distributed rows and delay the discussion of the case of independent, but not
identically distributed rows to Section 3.2. Furthermore, we assume that the rows
$u_{t\cdot} = (u_{t1},\dots,u_{tD})$ of $U$ satisfy the following condition: The moment generating functions
of the random variables $\mathrm{Re}(\overline{u_{1k}}u_{1j})$ and $\mathrm{Im}(\overline{u_{1k}}u_{1j})$ exist for all $1\le k,j\le D$; i.e.,
there exists $x_0>0$ such that for all $1\le k,j\le D$

\[ E[\exp(x\,\mathrm{Re}(\overline{u_{1k}}u_{1j}))] < \infty, \qquad E[\exp(x\,\mathrm{Im}(\overline{u_{1k}}u_{1j}))] < \infty \tag{3.1} \]

hold for all $|x| < x_0$. Note that a sufficient condition for (3.1) is that the moment
generating function of $|u_{1k}|^2 + |u_{1j}|^2$ exists for all $k,j$. Further, we let

\[ Q := E(u_{1\cdot}^*u_{1\cdot}) \in \mathbb{C}^{D\times D} \]

with entries $q_{kj}$. We note that by the strong law of large numbers $n^{-1}U^*U$ converges
to $Q = E[n^{-1}U^*U]$ almost surely.

Assumption (3.1) is easily seen to be equivalent to the existence of finite constants
$M\ge0$ and $v_{kj}\ge0$ such that for all $\ell\ge2$

\[ E\left[|\mathrm{Re}(\overline{u_{1k}}u_{1j}-q_{kj})|^{\ell}\right] \le 2^{-1}\ell!\,M^{\ell-2}v_{kj}, \tag{3.2} \]
\[ E\left[|\mathrm{Im}(\overline{u_{1k}}u_{1j}-q_{kj})|^{\ell}\right] \le 2^{-1}\ell!\,M^{\ell-2}v_{kj} \tag{3.3} \]

hold for all $1\le k,j\le D$. For a generalization leading to a slightly better, but more
complex bound see Section 3.2.

Remark 3.1. If the random variables $u_{1k}$ are bounded, i.e.,

\[ |\mathrm{Re}(\overline{u_{1k}}u_{1j}-q_{kj})| \le C \quad\text{and}\quad |\mathrm{Im}(\overline{u_{1k}}u_{1j}-q_{kj})| \le C \]

holds with probability 1 for all $1\le k,j\le D$, then (3.2) and (3.3) hold with $M = C/3$
and

\[ v_{kj} = \max\left\{E\left[(\mathrm{Re}(\overline{u_{1k}}u_{1j}-q_{kj}))^2\right],\ E\left[(\mathrm{Im}(\overline{u_{1k}}u_{1j}-q_{kj}))^2\right]\right\}. \tag{3.4} \]

This claim is obvious for $\ell=2$. For $\ell\ge3$ it follows from a general inequality for
arbitrary real-valued bounded random variables $X$:

\[ E[|X|^{\ell}] = E[X^2|X|^{\ell-2}] \le C^{\ell-2}E[X^2] = C^{\ell-2}\sigma^2 \le \frac{\ell!}{2\cdot3^{\ell-2}}\,C^{\ell-2}\sigma^2, \]

where $\sigma^2 = E[X^2]$ and $|X|\le C$ holds with probability 1. In particular, this shows
that the random Fourier matrix given in (2.1) satisfies (3.2) and (3.3). □

The proof of the main result in this section will make use of the following
Bernstein-type inequality for unbounded random variables given in Bennett [2,
eq. (7)], see also [TBI Lemma 2.2.11]:

Let $X_1,\dots,X_n$ be independent real-valued random variables with zero mean such
that $E|X_t|^{\ell} \le \ell!\,M^{\ell-2}v_t/2$ holds for every $\ell\ge2$ and $t=1,\dots,n$ for some finite
constants $M\ge0$ and $v_t\ge0$. Then for every $x>0$

\[ P\left(\left|\sum_{t=1}^n X_t\right| \ge x\right) \le 2\exp\left(-\frac{x^2}{2}\left(\sum_{t=1}^n v_t + Mx\right)^{-1}\right), \tag{3.5} \]

with the convention that the right-hand side in (3.5) is zero if $M=0$ and $\sum_{t=1}^n v_t = 0$.

Note that Bennett [2] assumes $\sum_{t=1}^n v_t > 0$ but the inequality (3.5) trivially also
holds for $\sum_{t=1}^n v_t = 0$ in which case the probability on the left-hand side is zero.
Inequality (3.5), and hence the subsequent results, can be somewhat improved, see
Bennett [2, eq. (7a)]. Since this does not result in any significant gain, we do not
give the details.
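
As a small numerical aid, the right-hand side of (3.5) can be evaluated as follows. This is a sketch only; the function name `bernstein_tail` and the argument `v_sum` (standing for $\sum_t v_t$) are labels introduced here.

```python
import math

def bernstein_tail(x, v_sum, M):
    """Right-hand side of (3.5): 2 * exp(-x^2 / (2 * (v_sum + M * x)))."""
    if v_sum == 0 and M == 0:
        return 0.0                # convention stated together with (3.5)
    return 2 * math.exp(-x * x / (2 * (v_sum + M * x)))

# Illustration: n = 100 independent signs X_t in {-1, +1}. They are bounded
# by C = 1, so the moment condition holds with M = 1/3 and v_t = 1.
tail_at_10 = bernstein_tail(10, 100, 1 / 3)
tail_at_20 = bernstein_tail(20, 100, 1 / 3)
```

For small $x$ the term $\sum_t v_t$ dominates the denominator and the bound is of Gaussian type; for large $x$ the term $Mx$ takes over and the decay is only exponential in $x$.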

Set

\[ v := \max_{1\le k,j\le D} v_{kj}. \]

Note that neither $v$ nor $M$ depend on $n$ because the rows are identically distributed.
However, they depend on the distribution of the random vector $u_{1\cdot}$ and hence may
depend on $D$. Our main result now reads as follows.

Theorem 3.1. Assume that the rows $u_{1\cdot},\dots,u_{n\cdot}$ of $U$ are i.i.d. random vectors in
$\mathbb{C}^D$ whose entries satisfy the moment bounds (3.2) and (3.3). Then, for every $\epsilon>0$,
the operator norm satisfies

\[ \|n^{-1}U^*U - Q\| < \epsilon \]

with probability at least

\[ 1 - 4D^2\exp\left(-\frac{n\epsilon^2}{D^2\left(4v + 2\sqrt{2}\,D^{-1}M\epsilon\right)}\right). \tag{3.6} \]

In particular, with probability not less than (3.6) the extremal eigenvalues of $n^{-1}U^*U$
satisfy

\[ \lambda_{\min}(Q) - \epsilon \le \lambda_{\min}(n^{-1}U^*U) \le \lambda_{\max}(n^{-1}U^*U) \le \lambda_{\max}(Q) + \epsilon. \tag{3.7} \]

Consequently, if $Q$ is non-singular and $\epsilon\in(0,\lambda_{\min}(Q))$, then the condition number
of $U^*U$ is bounded by $\frac{\lambda_{\max}(Q)+\epsilon}{\lambda_{\min}(Q)-\epsilon}$ with probability not less than (3.6).

In connection with (3.7) we note that $\lambda_{\min}(n^{-1}U^*U)\ge0$ holds trivially, since
the matrix $n^{-1}U^*U$ is nonnegative definite.
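
Because all constants in (3.6) are explicit, the bound can be evaluated directly. The following sketch does so; the function names and the sample values are illustrative assumptions, and a negative return value simply means the bound is vacuous for that sample size.

```python
import math

def theorem_3_1_probability(n, D, v, M, eps):
    """Lower bound (3.6):
    1 - 4 D^2 exp(-n eps^2 / (D^2 (4 v + 2 sqrt(2) M eps / D)))."""
    denom = D ** 2 * (4 * v + 2 * math.sqrt(2) * M * eps / D)
    return 1 - 4 * D ** 2 * math.exp(-n * eps ** 2 / denom)

def condition_number_bound(lam_max, lam_min, eps):
    """Bound (lam_max + eps) / (lam_min - eps), valid for 0 < eps < lam_min."""
    if not 0 < eps < lam_min:
        raise ValueError("need 0 < eps < lambda_min(Q)")
    return (lam_max + eps) / (lam_min - eps)

# Illustrative values: D = 5, v = 1/2, M = 1/3, eps = 1/2
p_small_n = theorem_3_1_probability(10, 5, 0.5, 1 / 3, 0.5)       # vacuous
p_large_n = theorem_3_1_probability(10 ** 6, 5, 0.5, 1 / 3, 0.5)  # close to 1
```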

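Proof: We first note that inequality (3.7) for the extremal eigenvalues of $n^{-1}U^*U$
follows from the inequality $\|n^{-1}U^*U - Q\| < \epsilon$ for the operator norm. Hence, it
suffices to concentrate on the operator norm, which we majorize with Schur's test
by using that $\|A\| \le \max_k\sum_j|a_{kj}|$ for self-adjoint $A = (a_{kj})$. In this way we obtain that

\[
\begin{aligned}
P\left(\|n^{-1}U^*U - Q\| \ge \epsilon\right)
&\le P\left(\max_{1\le k\le D}\sum_{j=1}^D\left|n^{-1}\sum_{t=1}^n(\overline{u_{tk}}u_{tj}-q_{kj})\right| \ge \epsilon\right) \\
&\le \sum_{k=1}^D P\left(\sum_{j=1}^D\left|n^{-1}\sum_{t=1}^n(\overline{u_{tk}}u_{tj}-q_{kj})\right| \ge \epsilon\right) \\
&\le \sum_{k,j=1}^D P\left(\left|n^{-1}\sum_{t=1}^n(\overline{u_{tk}}u_{tj}-q_{kj})\right| \ge \frac{\epsilon}{D}\right) \\
&\le \sum_{k,j=1}^D P\left(\left|\sum_{t=1}^n\mathrm{Re}(\overline{u_{tk}}u_{tj}-q_{kj})\right| \ge \frac{n\epsilon}{\sqrt{2}D}\right)
 + \sum_{k,j=1}^D P\left(\left|\sum_{t=1}^n\mathrm{Im}(\overline{u_{tk}}u_{tj}-q_{kj})\right| \ge \frac{n\epsilon}{\sqrt{2}D}\right).
\end{aligned}
\tag{3.8}
\]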



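For each index $k,j$ the inequality (3.5) gives

\[ P\left(\left|\sum_{t=1}^n\mathrm{Re}(\overline{u_{tk}}u_{tj}-q_{kj})\right| \ge \frac{n\epsilon}{\sqrt{2}D}\right) \le 2\exp\left(-\frac{n\epsilon^2}{D^2\left(4v_{kj} + 2\sqrt{2}\,D^{-1}M\epsilon\right)}\right) \tag{3.9} \]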



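and similarly for the imaginary part. Hence, we finally obtain

\[ P\left(\|n^{-1}U^*U - Q\| \ge \epsilon\right) \le 4\sum_{k,j=1}^D\exp\left(-\frac{n\epsilon^2}{D^2\left(4v_{kj} + 2\sqrt{2}\,D^{-1}M\epsilon\right)}\right) \le 4D^2\exp\left(-\frac{n\epsilon^2}{D^2\left(4v + 2\sqrt{2}\,D^{-1}M\epsilon\right)}\right) \tag{3.10} \]

with $v$ as defined above. Thus (3.6) follows. ■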



Remark 3.2. If $u_{1\cdot}$ possesses an absolutely continuous distribution then $Q$ is
automatically non-singular. More generally, this holds as long as the distribution of $u_{1\cdot}$
is not concentrated on a $(D-1)$-dimensional linear subspace of $\mathbb{C}^D$. To see this,
consider the quadratic forms $z^*Qz$ for $z\in\mathbb{C}^D$ and note that

\[ z^*Qz = E[z^*u_{1\cdot}^*u_{1\cdot}z] = E[|u_{1\cdot}z|^2] \ge 0. \]

Hence, if $z^*Qz = 0$, then $|u_{1\cdot}z| = 0$ with probability 1; thus the distribution of $u_{1\cdot}$
would have to reside in the orthogonal complement of the one-dimensional subspace
spanned by $z^*$. □

Remark 3.3. For real-valued random matrices $U$ we can improve the probability
bound (3.6) to

\[ 1 - 2D^2\exp\left(-\frac{n\epsilon^2}{2D^2\left(v + \frac{M\epsilon}{D}\right)}\right). \]

A similar improvement for real-valued $U$ applies to the subsequent corollary and
remark as well as to the results in Section 3.2. □

Corollary 3.2. Assume that the rows $u_{1\cdot},\dots,u_{n\cdot}$ of $U$ are i.i.d. random vectors in
$\mathbb{C}^D$ that are bounded, i.e.,

\[ |\mathrm{Re}(\overline{u_{1k}}u_{1j}-q_{kj})| \le C \quad\text{and}\quad |\mathrm{Im}(\overline{u_{1k}}u_{1j}-q_{kj})| \le C \]

holds with probability 1 for all $1\le k,j\le D$. Let

\[ b := \max_{k,j=1,\dots,D}\left\{E\left[(\mathrm{Re}(\overline{u_{1k}}u_{1j}-q_{kj}))^2\right],\ E\left[(\mathrm{Im}(\overline{u_{1k}}u_{1j}-q_{kj}))^2\right]\right\}. \]

Then the conclusions of Theorem 3.1 hold and (3.6) becomes

\[ 1 - 4D^2\exp\left(-\frac{n\epsilon^2}{D^2\left(4b + 2\sqrt{2}\,D^{-1}C\epsilon/3\right)}\right). \tag{3.11} \]

Proof: By Remark 3.1 conditions (3.2) and (3.3) hold with $M = C/3$ and $v_{kj}$ as
in (3.4). Then the statement follows from Theorem 3.1. ■

Remark 3.4. Corollary 3.2 can also be derived by using the classical Bernstein
inequality instead of inequality (3.5) in the proof of Theorem 3.1. Furthermore, the
bound in (3.11) can be somewhat improved by using an improved form of Bernstein's
inequality [2, eq. (8)] (see also [5, Corollary A.2]) for bounded random variables
instead of (3.5) in that step: If we use that inequality in the estimate (3.9), we arrive
at the following improved bound (provided $C>0$, $b>0$):

\[ 1 - 4D^2\exp\left(-C^{-2}nb\left(\left(1+\frac{C\epsilon}{\sqrt{2}Db}\right)\ln\left(1+\frac{C\epsilon}{\sqrt{2}Db}\right) - \frac{C\epsilon}{\sqrt{2}Db}\right)\right). \]

□

Let us now apply our findings to random sampling of trigonometric polynomials. 

Corollary 3.3. Let $x_1,\dots,x_n$ be independent random variables uniformly distributed
on $[0,1]^d$. Let $U$ be the associated $n\times D$ random Fourier matrix (2.1). Let $\epsilon>0$.
Then with probability at least

\[ 1 - 4D(D-1)\exp\left(-\frac{n\epsilon^2}{2\left((D-1)^2 + \sqrt{2}(D-1)\epsilon/3\right)}\right) \tag{3.12} \]

we have

\[ \|n^{-1}U^*U - Q\| < \epsilon, \tag{3.13} \]

and hence

\[ 1-\epsilon \le \lambda_{\min}(n^{-1}U^*U) \le \lambda_{\max}(n^{-1}U^*U) \le 1+\epsilon. \tag{3.14} \]

Consequently, for $0<\epsilon<1$, the condition number of $U^*U$ is bounded by
$(1+\epsilon)/(1-\epsilon)$ with probability not less than (3.12).

Proof: In this case $(n^{-1}U^*U)_{kj} = n^{-1}\sum_{t=1}^n e^{2\pi i(j-k)\cdot x_t}$ and consequently $Q = I$,
so $\lambda_{\min}(Q) = \lambda_{\max}(Q) = 1$. [By abuse of notation, $k$ denotes both an element of $\Gamma$
and a column index.] Furthermore, $\sum_{t=1}^n(\overline{u_{tk}}u_{tj}-q_{kj}) = 0$ for $k=j$. Hence, the
double sum in the second line of (3.8) only extends over $j\ne k$ and consequently
$\epsilon/D$ can be replaced by $\epsilon/(D-1)$ in the subsequent steps in (3.8). Furthermore,
when deducing (3.10) from the union bound in (3.8) we only have to take into
account $D(D-1)$ instead of $D^2$ summands; cf. also Remark 3.8 below. Moreover,
$|\mathrm{Re}(\overline{u_{tk}}u_{tj}-q_{kj})| \le 1$ for all $k,j$. For $k\ne j$ we have

\[ E\left[(\mathrm{Re}(\overline{u_{tk}}u_{tj}-q_{kj}))^2\right] = \int_{[0,1]^d}\left(\mathrm{Re}(\exp(2\pi i(j-k)\cdot x))\right)^2dx = \frac{1}{2}, \]

hence $v_{kj} = 1/2$. The same holds for the imaginary part. In view of Remark 3.1 the
result follows. ■

From the previous result it is easy to determine the minimal number of sampling 
points sufficient to provide a small condition number with high probability. 

Corollary 3.4. Let $x_1,\dots,x_n$ be independent random variables uniformly distributed
on $[0,1]^d$. Let $U$ be the associated $n\times D$ random Fourier matrix (2.1). Let $0<\epsilon<1$,
$0<\delta<1$ and suppose

\[ n \ge \frac{2}{\epsilon^2}\left((D-1)^2 + \frac{\sqrt{2}(D-1)\epsilon}{3}\right)\ln\left(\frac{4D(D-1)}{\delta}\right). \tag{3.15} \]

Then (3.13) and (3.14) hold with probability at least $1-\delta$.



We note that (3.15) is implied by the more compact inequality

\[ n \ge \frac{CD^2\ln(D/\delta)}{\epsilon^2} \tag{3.16} \]

for an appropriate constant $C$. We will improve on this result in Corollary 3.5 and
in Section 4, see in particular (4.2).
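
The relation between (3.12) and (3.15) is easy to check numerically: solving $4D(D-1)\exp(\cdot)\le\delta$ for $n$ gives (3.15). The sketch below computes the smallest integer $n$ allowed by (3.15) and plugs it back into (3.12); the function names and the values $D = 25$, $\epsilon = 0.5$, $\delta = 0.05$ are illustrative assumptions.

```python
import math

def corollary_3_3_probability(n, D, eps):
    """Lower bound (3.12) for the random Fourier matrix (requires D >= 2)."""
    denom = 2 * ((D - 1) ** 2 + math.sqrt(2) * (D - 1) * eps / 3)
    return 1 - 4 * D * (D - 1) * math.exp(-n * eps ** 2 / denom)

def corollary_3_4_samples(D, eps, delta):
    """Smallest integer n satisfying (3.15)."""
    rhs = ((2 / eps ** 2)
           * ((D - 1) ** 2 + math.sqrt(2) * (D - 1) * eps / 3)
           * math.log(4 * D * (D - 1) / delta))
    return math.ceil(rhs)

n_required = corollary_3_4_samples(25, 0.5, 0.05)
p_at_n = corollary_3_3_probability(n_required, 25, 0.5)         # at least 1 - delta
p_at_half = corollary_3_3_probability(n_required // 2, 25, 0.5)  # bound is vacuous
```

Halving the sample size makes the bound vacuous, which reflects the $D^2$ scaling of (3.15) rather than a statement about the true probability.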



3.1.1 Comparison With Other Results 

Recently Mendelson and Pajor [9] provided a related exponential inequality for
random matrices with i.i.d. real-valued rows. They assume the following properties:

(a) There exists $\rho>0$ such that for every $\theta\in\mathbb{R}^D$, $\|\theta\|_2 = 1$,
$(E|\langle u_{1\cdot},\theta\rangle|^4)^{1/4} \le \rho < \infty$.

(b) Set $Z = \|u_{1\cdot}\|_2$, then $\|Z\|_{\psi_\alpha} < \infty$ for some $\alpha\ge1$.

Here the Orlicz norm $\|\cdot\|_{\psi_\alpha}$ of a real-valued random variable $Y$ with respect to
$\psi_\alpha(x) = \exp(x^\alpha)-1$ is defined as $\|Y\|_{\psi_\alpha} = \inf\{C>0 : E\psi_\alpha(|Y|/C)\le1\}$. Note
that if $\alpha\ge2$, condition (b) is stronger than our assumption (3.1), hence condition
(b) implies (3.2)-(3.3).

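Under conditions (a) and (b), Mendelson and Pajor [9, Theorem 2.1] show that
there exists an absolute constant $c>0$ such that for every $\epsilon>0$ the operator norm
satisfies

\[ P\left(\|n^{-1}U^*U - Q\| < \epsilon\right) \ge 1 - 2\exp\left(-\left(\frac{c\,\epsilon}{\max\{B_n, A_n^2\}}\right)^{\beta}\right), \tag{3.17} \]

where $\beta = (1+2/\alpha)^{-1}$ and

\[ A_n = \|Z\|_{\psi_\alpha}\frac{\sqrt{\ln(\min(D,n))}\,(\ln n)^{1/\alpha}}{\sqrt{n}}, \qquad B_n = \frac{\rho^2}{\sqrt{n}} + \lambda_{\max}(Q)^{1/2}A_n. \]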



We have added a factor 2 on the right-hand side of (3.17) to correct a missing
constant in [9, Theorem 2.1]. Since the constant $c$ is not specified, the value of
(3.17) is mainly for asymptotics as $n\to\infty$, whereas the results of Section 3 yield
estimates with explicit constants for given $n$. Moreover, the probability estimate
(3.17) is only subexponential. For fixed dimension $D$, the right-hand side of (3.17)
is of the order

\[ 1 - 2\exp\left(-c_1n^{\alpha/(2\alpha+4)}(\ln n)^{-1/(\alpha+2)}\right), \]

whereas the bound in (3.6) is exponential of the form

\[ 1 - c_2\exp(-c_3n). \]

Here $c_1$, $c_2$, and $c_3$ are constants that depend on $D$. We also note that the proof of
Theorem 3.1 is quite simple and easily extends to the case of independent but not
necessarily identically distributed rows as shown in the next section.

Nevertheless, the result of Mendelson and Pajor [9, Theorem 2.1] can be used to
improve upon (3.15) in the special case where the set $\Gamma$ is symmetric in the sense
that $k\in\Gamma$ implies $-k\in\Gamma$.

Corollary 3.5. Let $x_1,\dots,x_n$ be independent random variables uniformly distributed
on $[0,1]^d$. Let $U$ be the associated $n\times D$ random Fourier matrix (2.1) and assume
that $\Gamma$ is symmetric. Let $0<\epsilon<1$, $0<\delta<1$ and suppose

\[ n \ge \max\left\{\frac{\ln(2/\delta)}{c\,\epsilon}\,D\ln D,\ \frac{(\ln(2/\delta))^2}{c^2\epsilon^2}\left(\sqrt{D}+\sqrt{D\ln D}\right)^2\right\} \tag{3.18} \]

where $c$ is the absolute constant in (3.17). Then (3.13) and (3.14) hold with
probability at least $1-\delta$.



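Proof: Although it is possible to adapt [9] to complex-valued random matrices, we
will use the result as stated.

Since $\Gamma$ is symmetric by assumption, we may write it as $\Gamma = A\cup(-A)$ with
$A\cap(-A)\subseteq\{0\}$. Define the real $n\times D$ matrix $W$ by $w_{t,-k} = \sqrt{2}\cos(2\pi k\cdot x_t)$ and
$w_{tk} = \sqrt{2}\sin(2\pi k\cdot x_t)$ for $k\in A\setminus\{0\}$ and set $w_{t0} = 1$ if $0\in A$. Then clearly
$U = WS$ where $S$ is a unitary $D\times D$ matrix and consequently
$\|n^{-1}U^*U - I\| = \|n^{-1}W'W - I\|$. To apply (3.17) we note that $\|w_{1\cdot}\|_2 = D^{1/2}$ and hence
$\|Z\|_{\psi_\alpha} = \|\,\|w_{1\cdot}\|_2\,\|_{\psi_\alpha} = D^{1/2}(\ln2)^{-1/\alpha}$ for every $\alpha\ge1$. Furthermore,

\[
\begin{aligned}
\sup_{\|\theta\|_2=1,\ \theta\in\mathbb{R}^D}\left(E|\langle w_{1\cdot},\theta\rangle|^4\right)^{1/4}
&= \sup_{\|\theta\|_2=1,\ \theta\in\mathbb{R}^D}\left(E|\langle u_{1\cdot}S^*,\theta\rangle|^4\right)^{1/4} \\
&= \sup_{\|\theta\|_2=1,\ \theta\in\mathbb{R}^D}\left(E|\langle u_{1\cdot},\theta S\rangle|^4\right)^{1/4} \\
&\le \sup_{\|\theta\|_2=1,\ \theta\in\mathbb{C}^D}\left(E|\langle u_{1\cdot},\theta\rangle|^4\right)^{1/4}.
\end{aligned}
\]

Now, since $\left|\sum_{k\in\Gamma}\theta_k\exp(2\pi ik\cdot x_1)\right|^2 \le \left(\sum_{k\in\Gamma}|\theta_k|\right)^2 \le D$ we obtain

\[ E|\langle u_{1\cdot},\theta\rangle|^4 \le D\,E\left|\sum_{k\in\Gamma}\theta_k\exp(2\pi ik\cdot x_1)\right|^2 = D. \]

This shows that the rows $w_{t\cdot}$ satisfy condition (a) in [9, Theorem 2.1] with $\rho = D^{1/4}$.
As a consequence, (3.17) applies to $W$, and hence to the Fourier matrix $U$, for every
$\alpha\ge1$. Since the left-hand side of (3.17) does not depend on $\alpha$, we may let $\alpha\to\infty$
and obtain for $n\ge D$ the bound

\[ P\left(\|n^{-1}U^*U - Q\| < \epsilon\right) \ge 1 - 2\exp\left(-c\,\epsilon\min\left\{\frac{n^{1/2}}{\sqrt{D}+\sqrt{D\ln D}},\ \frac{n}{D\ln D}\right\}\right). \]

The probability is not less than $1-\delta$ whenever condition (3.18) holds. ■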



Comparing (3.18) with (3.15), we have gained on the exponent of $D$. However,
the quantity $\ln(\delta^{-1})$ now enters quadratically instead of linearly, and an unspecified
constant appears in the lower bound for $n$.



3.2 The Non-I.I.D. Case and Other Generalizations

In this section we generalize the results to the case where the random matrix
$U\in\mathbb{C}^{n\times D}$ has independent rows which, however, need not be identically distributed. In
the course of this generalization we also obtain some slight improvements in the case
of i.i.d. rows discussed above. Apart from the assumption of independent rows, we
assume that the matrix $U$ satisfies the following condition: The moment generating
functions of the random variables $\mathrm{Re}(\overline{u_{tk}}u_{tj})$ and $\mathrm{Im}(\overline{u_{tk}}u_{tj})$ exist for all $1\le t\le n$
and $1\le k,j\le D$; i.e., there exists $x_0>0$ such that for all $1\le t\le n$ and
$1\le k,j\le D$

\[ E[\exp(x\,\mathrm{Re}(\overline{u_{tk}}u_{tj}))] < \infty, \qquad E[\exp(x\,\mathrm{Im}(\overline{u_{tk}}u_{tj}))] < \infty \tag{3.19} \]

holds for all $|x| < x_0$. Note that $x_0$ will depend on the distribution of $U$ and thus
may depend on $n$ and $D$. Furthermore, we set



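\[ Q^{(t)} := E(u_{t\cdot}^*u_{t\cdot}) \in \mathbb{C}^{D\times D} \]

with entries $q_{kj}^{(t)}$ and

\[ Q_n := n^{-1}\sum_{t=1}^n Q^{(t)} = E[n^{-1}U^*U] \in \mathbb{C}^{D\times D}. \tag{3.20} \]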

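As in Section 3.1 assumption (3.19) is seen to be equivalent to the existence of
finite constants $M_{kj1}^{(t)}\ge0$, $M_{kj2}^{(t)}\ge0$, $v_{kj1}^{(t)}\ge0$, $v_{kj2}^{(t)}\ge0$, such that for all $\ell\ge2$

\[ E\left[|\mathrm{Re}(\overline{u_{tk}}u_{tj}-q_{kj}^{(t)})|^{\ell}\right] \le 2^{-1}\ell!\,(M_{kj1}^{(t)})^{\ell-2}v_{kj1}^{(t)}, \tag{3.21} \]
\[ E\left[|\mathrm{Im}(\overline{u_{tk}}u_{tj}-q_{kj}^{(t)})|^{\ell}\right] \le 2^{-1}\ell!\,(M_{kj2}^{(t)})^{\ell-2}v_{kj2}^{(t)} \tag{3.22} \]

hold for all $1\le t\le n$ and $1\le k,j\le D$. If $M_{kji}^{(t)}v_{kji}^{(t)} = 0$ for some $i\in\{1,2\}$, then we may
assume without loss of generality that $M_{kji}^{(t)} = v_{kji}^{(t)} = 0$.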

For fixed $n$ it is always possible to choose the constants on the right-hand side
of (3.21) and (3.22) independent of $t$. However, for $n\to\infty$ the resulting conditions
in Theorem 3.6 below would become unnecessarily restrictive in the non-identically
distributed case. Furthermore, allowing the constants to depend on $k,j$ and to be
different in (3.21) and (3.22), provides some extra flexibility which results in an
improved, albeit more complex bound even in the case of i.i.d. rows.

Remark 3.5. Condition (3.21) necessarily implies $v_{kj1}^{(t)} \ge (\sigma_{kj1}^{(t)})^2$, where $(\sigma_{kj1}^{(t)})^2$ denotes
the variance of $\mathrm{Re}(\overline{u_{tk}}u_{tj}-q_{kj}^{(t)})$. Furthermore, observe that given condition (3.21) is
satisfied, it is also always satisfied with $v_{kj1}^{(t)} = (\sigma_{kj1}^{(t)})^2$. [This is obvious if $\sigma_{kj1}^{(t)} = 0$, and
otherwise follows by replacing $M_{kj1}^{(t)}$ with $M_{kj1}^{(t)}v_{kj1}^{(t)}/(\sigma_{kj1}^{(t)})^2$, observing that $v_{kj1}^{(t)}/(\sigma_{kj1}^{(t)})^2 \ge 1$
as noted before.] Similar comments apply to condition (3.22). □

Remark 3.6. If the random variables $u_{tk}$ are bounded, i.e.,

\[ |\mathrm{Re}(\overline{u_{tk}}u_{tj}-q_{kj}^{(t)})| \le C_{kj1}^{(t)} \quad\text{and}\quad |\mathrm{Im}(\overline{u_{tk}}u_{tj}-q_{kj}^{(t)})| \le C_{kj2}^{(t)} \]

holds with probability 1 for all $1\le t\le n$, $1\le k,j\le D$, then (3.21) and (3.22) hold
with $M_{kj1}^{(t)} = C_{kj1}^{(t)}/3$, $M_{kj2}^{(t)} = C_{kj2}^{(t)}/3$, and

\[ v_{kj1}^{(t)} = E\left[(\mathrm{Re}(\overline{u_{tk}}u_{tj}-q_{kj}^{(t)}))^2\right] \quad\text{and}\quad v_{kj2}^{(t)} = E\left[(\mathrm{Im}(\overline{u_{tk}}u_{tj}-q_{kj}^{(t)}))^2\right]. \tag{3.23} \]

This follows exactly as in Remark 3.1. □

In order to present the generalization of Theorem 3.1 we introduce

\[ v_{kj1n} := \sum_{t=1}^n v_{kj1}^{(t)}, \qquad v_{kj2n} := \sum_{t=1}^n v_{kj2}^{(t)} \]

and

\[ M_{kj1n} = \max\{M_{kj1}^{(t)} : 1\le t\le n\}, \qquad M_{kj2n} = \max\{M_{kj2}^{(t)} : 1\le t\le n\}. \]

Furthermore, set $v_n = \max\{v_{kj1n}, v_{kj2n} : 1\le k,j\le D\}$ and
$M_n = \max\{M_{kj1n}, M_{kj2n} : 1\le k,j\le D\}$. Note that $v_n$ and $M_n$ depend on the distribution of the random
matrix $U$ and hence may depend on $D$. The expression on the right-hand side of
(3.24) below is the direct generalization of (3.6) to the non-identically distributed
case, whereas the bound $1-\Psi$ given in (3.27) below is an improvement (even in the
case of i.i.d. rows).

Theorem 3.6. Assume that the rows $u_{1\cdot},\dots,u_{n\cdot}$ of $U$ are independent random
vectors in $\mathbb{C}^D$ whose entries satisfy the moment bounds (3.21) and (3.22). Then,
for every $\epsilon>0$, the operator norm satisfies

\[ \|n^{-1}U^*U - Q_n\| < \epsilon \]

with probability at least $1-\Psi$ where $\Psi$ is defined in (3.27) below. Furthermore,

\[ 1-\Psi \ge 1 - 4D^2\exp\left(-\frac{n\epsilon^2}{D^2\left(4n^{-1}v_n + 2\sqrt{2}\,D^{-1}M_n\epsilon\right)}\right). \tag{3.24} \]

In particular, with probability not less than $1-\Psi$, the extremal eigenvalues of $n^{-1}U^*U$
satisfy

\[ \lambda_{\min}(Q_n) - \epsilon \le \lambda_{\min}(n^{-1}U^*U) \le \lambda_{\max}(n^{-1}U^*U) \le \lambda_{\max}(Q_n) + \epsilon. \tag{3.25} \]

Consequently the condition number of $U^*U$ is bounded by $\frac{\lambda_{\max}(Q_n)+\epsilon}{\lambda_{\min}(Q_n)-\epsilon}$ with probability
not less than $1-\Psi$, provided that $Q_n$ defined in (3.20) is non-singular and
$\epsilon\in(0,\lambda_{\min}(Q_n))$.

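Proof: Exactly as in the proof of Theorem 3.1 we arrive at

\[
\begin{aligned}
P\left(\|n^{-1}U^*U - Q_n\| \ge \epsilon\right)
&\le \sum_{k,j=1}^D P\left(\left|\sum_{t=1}^n\mathrm{Re}(\overline{u_{tk}}u_{tj}-q_{kj}^{(t)})\right| \ge \frac{n\epsilon}{\sqrt{2}D}\right) \\
&\quad + \sum_{k,j=1}^D P\left(\left|\sum_{t=1}^n\mathrm{Im}(\overline{u_{tk}}u_{tj}-q_{kj}^{(t)})\right| \ge \frac{n\epsilon}{\sqrt{2}D}\right).
\end{aligned}
\tag{3.26}
\]

Again using inequality (3.5) for each $k,j$ gives

\[ P\left(\left|\sum_{t=1}^n\mathrm{Re}(\overline{u_{tk}}u_{tj}-q_{kj}^{(t)})\right| \ge \frac{n\epsilon}{\sqrt{2}D}\right) \le 2\exp\left(-\frac{n\epsilon^2}{D^2\left(4n^{-1}v_{kj1n} + 2\sqrt{2}\,D^{-1}M_{kj1n}\epsilon\right)}\right) \]

and similarly for the imaginary part. Hence, we finally obtain
$P(\|n^{-1}U^*U - Q_n\| \ge \epsilon) \le \Psi$ where

\[ \Psi := 2\sum_{i=1}^2\sum_{k,j=1}^D\exp\left(-\frac{n\epsilon^2}{D^2\left(4n^{-1}v_{kjin} + 2\sqrt{2}\,D^{-1}M_{kjin}\epsilon\right)}\right) \le 4D^2\exp\left(-\frac{n\epsilon^2}{D^2\left(4n^{-1}v_n + 2\sqrt{2}\,D^{-1}M_n\epsilon\right)}\right). \tag{3.27} \]

■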



Remark 3.7. A sufficient condition for $Q_n$ to be non-singular is that at least one
of the matrices $Q^{(t)}$ has this property. The argument in Remark 3.2 shows that
the latter is the case if the distribution of $u_{t\cdot}$ is not concentrated on a
$(D-1)$-dimensional linear subspace of $\mathbb{C}^D$. However, note the possibility that nevertheless
$\lambda_{\min}(Q_n)\to0$ as $n\to\infty$. □

Remark 3.8. (i) In case the $(k,j)$-element of $n^{-1}U^*U - Q_n$ is zero with probability
1, the corresponding terms on the right-hand side of (3.26) are zero and do not
contribute to the bound in (3.26). Due to the independence assumption, the
$(k,j)$-element is zero if and only if $\overline{u_{tk}}u_{tj} - q_{kj}^{(t)} = 0$ with probability 1 for every $t$. Hence,
we may set $v_{kj1}^{(t)} = v_{kj2}^{(t)} = M_{kj1}^{(t)} = M_{kj2}^{(t)} = 0$ which shows that the corresponding
terms in the bound $\Psi$ are also automatically zero. However, in this case the bound
(3.26) and the subsequent bounds can be improved in that in the $(k,j)$-th term in
both sums on the right-hand side of (3.26) the constant $D$ can be replaced by $D_k$,
where $D_k$ denotes the number of non-zero elements in the $k$-th row of $n^{-1}U^*U - Q_n$.

(ii) A similar remark applies in the case that some or all elements of $n^{-1}U^*U - Q_n$
are real (or imaginary). Cf. Remark 3.3. □



4 Random Sampling of Trigonometric Polynomials Revisited

We now return to the special case of sampling trigonometric polynomials on
uniformly distributed random points and show how the results in the previous sections
can be improved. The analysis is based on techniques developed in [10] for the
recovery of sparse trigonometric polynomials from random samples by basis pursuit
($\ell_1$-minimization) and orthogonal matching pursuit. Some of the ideas are inspired
by the pioneering work of Candès, Romberg and Tao in [3].

Theorem 4.1. Let $\Gamma\subset\mathbb{Z}^d$ be of size $|\Gamma| = D$ and let $x_1,\dots,x_n$ be i.i.d. random
variables that are uniformly distributed on $[0,1]^d$. Let $U$ be the associated random
Fourier matrix given by (2.1). Choose $0<\epsilon<1$, $0<\alpha<\epsilon^2$, and $\delta>0$. If

\[ \left\lfloor\frac{\alpha n}{3D}\right\rfloor \ge \left(\ln\frac{\epsilon^2}{\alpha}\right)^{-1}\ln\frac{D}{\delta(1-\alpha)}, \tag{4.1} \]

then, with probability at least $1-\delta$, we have

\[ \|n^{-1}U^*U - I\| < \epsilon \]

and hence

\[ 1-\epsilon \le \lambda_{\min}(n^{-1}U^*U) \le \lambda_{\max}(n^{-1}U^*U) \le 1+\epsilon. \]

Consequently, the condition number of $U^*U$ is bounded by $\frac{1+\epsilon}{1-\epsilon}$ with probability
$\ge 1-\delta$.



For instance, the choice $\alpha = \epsilon^2/e$ gives

\[ n \ge \frac{3De}{\epsilon^2}\left(\ln\left(\frac{D}{\delta}\right) + 2 - \ln(e-1)\right) \tag{4.2} \]

as a simple sufficient condition.

Comparing (4.2) with (3.15) or (3.16), we have gained on the exponent of $D$;
compared with Theorem 2.2 and (2.6), the constants are now independent of the
dimension $d$ of the state space (= the number of variables); compared with (3.18),
the term $\ln(\delta^{-1})$ only enters linearly in (4.1) and (4.2) instead of quadratically and
there is now no restriction on $\Gamma$. Moreover, the constants are explicit and small.
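
The step from (4.1) to (4.2) can be verified numerically. With $\alpha = \epsilon^2/e$ one has $\ln(\epsilon^2/\alpha) = 1$ and $1-\alpha \ge (e-1)/e$, and $\lfloor y\rfloor \ge y-1$ accounts for the extra summand. The sketch below, with hypothetical function names and illustrative values, computes the smallest integer $n$ allowed by (4.2) and confirms that it satisfies (4.1).

```python
import math

def theorem_4_1_samples(D, eps, delta):
    """Smallest integer n satisfying the sufficient condition (4.2)."""
    rhs = (3 * D * math.e / eps ** 2
           * (math.log(D / delta) + 2 - math.log(math.e - 1)))
    return math.ceil(rhs)

def satisfies_4_1(n, D, eps, delta, alpha):
    """Check condition (4.1) for a given 0 < alpha < eps^2."""
    lhs = math.floor(alpha * n / (3 * D))
    rhs = math.log(D / (delta * (1 - alpha))) / math.log(eps ** 2 / alpha)
    return lhs >= rhs

# Illustrative values: D = 25, eps = 0.5, delta = 0.05
n_42 = theorem_4_1_samples(25, 0.5, 0.05)
ok = satisfies_4_1(n_42, 25, 0.5, 0.05, 0.5 ** 2 / math.e)
```

For these values (4.2) asks for a few thousand samples, whereas (3.15) with the same $D$, $\epsilon$, $\delta$ asks for roughly fifty thousand, which illustrates the gain in the exponent of $D$.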



4.1 Proof of Theorem 4.1

We introduce the polynomials

\[ F_m(z) = \sum_{k=1}^{\lfloor m/2\rfloor} S_2(m,k)z^k, \qquad m\in\mathbb{N}, \tag{4.3} \]

where $S_2(m,k)$ are the associated Stirling numbers of the second kind. These are
connected to the combinatorics of certain set partitions, and they can be computed
by means of their exponential generating function, see [12, formula (27), p. 77] or
Sloane's A008299 in the On-Line Encyclopedia of Integer Sequences,

\[ \sum_{m=1}^{\infty} F_m(z)\frac{x^m}{m!} = \exp\left(z(e^x - x - 1)\right). \tag{4.4} \]

Further, we define

\[ G_m(z) := z^{-m}F_m(z). \tag{4.5} \]

Using the $G_m$'s, we first establish a more general result from which Theorem 4.1
will follow.
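
For readers who want to experiment with $F_m$ and $G_m$, the associated Stirling numbers of the second kind can also be generated by the standard recurrence $S_2(m+1,k) = k\,S_2(m,k) + m\,S_2(m-1,k-1)$. This recurrence is a known property of these numbers and is used here instead of the generating function (4.4); the function names below are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def associated_stirling2(m_max):
    """Table S[m][k] of associated Stirling numbers of the second kind
    (set partitions of an m-set into k blocks, each of size >= 2),
    built from S(m+1, k) = k*S(m, k) + m*S(m-1, k-1)."""
    S = [[0] * (m_max + 1) for _ in range(m_max + 1)]
    S[0][0] = 1
    for m in range(1, m_max):
        for k in range(1, m_max + 1):
            S[m + 1][k] = k * S[m][k] + m * S[m - 1][k - 1]
    return S

def G(m, z, S):
    """G_m(z) = z^{-m} * sum_{k=1}^{floor(m/2)} S_2(m, k) z^k, cf. (4.3), (4.5)."""
    z = Fraction(z)
    return sum(S[m][k] * z ** k for k in range(1, m // 2 + 1)) / z ** m

S = associated_stirling2(6)
```

The first rows of the table agree with the sequence cited above, and for example `G(2, 5, S)` equals $1/5$ because $F_2(z) = z$.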

Theorem 4.2. Let $\Gamma\subset\mathbb{Z}^d$ be of size $|\Gamma| = D$ and let $x_1,\dots,x_n$ be i.i.d. random
variables that are uniformly distributed on $[0,1]^d$. Let $U$ be the associated random
Fourier matrix given by (2.1) and let $\epsilon>0$. Then, for every $m\in\mathbb{N}$, we have

\[ \|n^{-1}U^*U - I\| < \epsilon, \]

and hence

\[ 1-\epsilon \le \lambda_{\min}(n^{-1}U^*U) \le \lambda_{\max}(n^{-1}U^*U) \le 1+\epsilon, \]

with probability at least

\[ 1 - \epsilon^{-2m}DG_{2m}(n/D). \]

Proof: Again, the estimates for the eigenvalues follow from the inequality
$\|n^{-1}U^*U - I\| < \epsilon$. Furthermore, since $n^{-1}U^*U - I$ is self-adjoint, we have for every $m\in\mathbb{N}$

\[ \|n^{-1}U^*U - I\| = \|(n^{-1}U^*U - I)^m\|^{1/m} \le \|(n^{-1}U^*U - I)^m\|_F^{1/m}, \]

where $\|\cdot\|_F$ denotes the Frobenius norm, $\|A\|_F = \sqrt{\mathrm{Tr}(AA^*)}$. Consequently,

\[ P\left(\|n^{-1}U^*U - I\| \ge \epsilon\right) \le P\left(\|(n^{-1}U^*U - I)^m\|_F \ge \epsilon^m\right). \]

We now apply Markov's inequality and obtain that

\[ P\left(\|(n^{-1}U^*U - I)^m\|_F \ge \epsilon^m\right) \le \epsilon^{-2m}E\left[\|(n^{-1}U^*U - I)^m\|_F^2\right]. \]

The latter expectation was studied in [10, Section 3.3], see also Lemma 3.3 in [10]:
It was shown that

\[ E\left[\|(n^{-1}U^*U - I)^m\|_F^2\right] \le DG_{2m}(n/D), \tag{4.6} \]

which concludes the proof. ■

We now show how Theorem 4.1 follows from Theorem 4.2. This is done by
estimating $G_m$ and by a diligent choice of the free parameter $m$.

We set the oversampling rate to be $\theta = n/D$. In [10, Section 3.5] it was shown
that

\[ G_{2m}(\theta) \le \frac{(3m/\theta)^m}{1-(3m/\theta)}. \]

For given $0<\alpha<1$ and $\theta$, we choose $m = m(\theta)\in\mathbb{N}$ such that $3m(\theta)/\theta \le \alpha < 1$.
Note that this is possible since $\lfloor\alpha n/(3D)\rfloor \ge 1$ follows from the assumptions of the
theorem. In the following we will take the value

\[ m(\theta) = \lfloor\alpha\theta/3\rfloor, \]

and obtain that

\[ G_{2m(\theta)}(\theta) \le \frac{\alpha^{m(\theta)}}{1-\alpha}. \tag{4.7} \]

In view of Theorem 4.2 we want to achieve $\epsilon^{-2m}DG_{2m}(\theta) \le \delta$. By (4.7) this
inequality is satisfied if

\[ D\epsilon^{-2m}\frac{\alpha^m}{1-\alpha} \le \delta, \]

which is equivalent to

\[ \ln\left(\frac{\epsilon^2}{\alpha}\right)m(\theta) \ge \ln\left(\frac{D}{(1-\alpha)\delta}\right). \]

Since $\alpha<\epsilon^2$ by assumption, Theorem 4.1 follows.
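
The choice of $m$ can be illustrated by evaluating the failure probability bound $\epsilon^{-2m}DG_{2m}(n/D)$ of Theorem 4.2 exactly for $m = \lfloor\alpha n/(3D)\rfloor$. The sketch below is an illustration under stated assumptions: the function names and the values $n = 2000$, $D = 10$, $\epsilon = 0.5$ are chosen here, and the associated Stirling numbers are generated by the standard recurrence $S_2(r+1,k) = k\,S_2(r,k) + r\,S_2(r-1,k-1)$ rather than by (4.4).

```python
import math
from fractions import Fraction

def G_even(m, theta):
    """G_{2m}(theta) from (4.3) and (4.5), computed in exact arithmetic."""
    r_max = 2 * m
    S = [[0] * (r_max + 1) for _ in range(r_max + 1)]
    S[0][0] = 1
    for r in range(1, r_max):
        for k in range(1, r_max + 1):
            S[r + 1][k] = k * S[r][k] + r * S[r - 1][k - 1]
    theta = Fraction(theta)
    total = sum(S[r_max][k] * theta ** k for k in range(1, m + 1))
    return total / theta ** r_max

def failure_bound(n, D, eps, alpha):
    """Return (eps^{-2m} * D * G_{2m}(n/D), m) with m = floor(alpha*n/(3D))."""
    m = math.floor(alpha * n / (3 * D))
    if m < 1:
        raise ValueError("need floor(alpha*n/(3D)) >= 1")
    bound = D * G_even(m, Fraction(n, D)) / Fraction(eps) ** (2 * m)
    return float(bound), m

bound, m = failure_bound(2000, 10, 0.5, 0.5 ** 2 / math.e)
```

In this example $n = 2000$ satisfies (4.2) for $\delta = 0.1$, and the exactly evaluated bound lies far below $\delta$, which suggests that the estimate (4.7) leaves some room in concrete cases.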